Multi-level memory prefetching for media and stream processing

Author

  • Jason Fritts
Abstract

This paper presents a multi-level memory prefetch hierarchy for media and stream processing applications. Two major bottlenecks in the performance of multimedia and network applications are long memory latencies and limited off-chip processor bandwidth. Aggressive prefetching can mitigate the memory latency problem, but overly aggressive prefetching may overload the limited external processor bandwidth. To address both problems, we propose multi-level memory prefetching. The multi-level organization enables conservative prefetching on-chip and more aggressive prefetching off-chip. The combination provides aggressive prefetching while minimally impacting off-chip bandwidth, enabling more efficient memory performance for media and stream processing. This paper presents preliminary results for multi-level memory prefetching, which show that combining prefetching at the L1 and DRAM memory levels provides the most effective prefetching with minimal extra bandwidth.
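
The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator: it assumes a simple next-block (sequential) prefetcher at each level, a small LRU-managed L1, and a memory-side DRAM prefetch buffer whose fills do not cross the processor pins. All names (`LRUSet`, `simulate`, `l1_degree`, `dram_degree`) and sizes are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class LRUSet:
    """Fixed-capacity LRU set of block numbers (toy stand-in for a cache or buffer)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def __contains__(self, block):
        return block in self.blocks

    def touch(self, block):
        # Mark an already-present block as most recently used.
        self.blocks.move_to_end(block)

    def insert(self, block):
        # Add (or refresh) a block, evicting the least recently used if full.
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


def simulate(trace, l1_degree, dram_degree, l1_size=64, dram_buf_size=32):
    """Replay a trace of block numbers through a two-level prefetch model.

    l1_degree   -- blocks prefetched on-chip per L1 miss (assumed: keep small)
    dram_degree -- blocks staged in the memory-side buffer per request
                   (assumed: can be larger, since it uses no pin bandwidth)
    """
    l1 = LRUSet(l1_size)
    dram_buf = LRUSet(dram_buf_size)
    stats = {"l1_misses": 0, "pin_transfers": 0, "slow_demand_accesses": 0}

    def fetch_over_pins(block, demand):
        # Every block brought on-chip costs one off-chip (pin) transfer.
        stats["pin_transfers"] += 1
        # A demand fetch that misses the DRAM buffer pays the full DRAM latency.
        if demand and block not in dram_buf:
            stats["slow_demand_accesses"] += 1
        # Memory-side prefetch: stage the following blocks next to the DRAM.
        for i in range(1, dram_degree + 1):
            dram_buf.insert(block + i)
        l1.insert(block)

    for block in trace:
        if block in l1:
            l1.touch(block)
            continue
        stats["l1_misses"] += 1
        fetch_over_pins(block, demand=True)
        # Conservative on-chip prefetch of the next l1_degree blocks.
        for i in range(1, l1_degree + 1):
            if block + i not in l1:
                fetch_over_pins(block + i, demand=False)
    return stats


if __name__ == "__main__":
    stream = list(range(1000))  # one long sequential stream
    short_runs = [r * 1000 + k for r in range(100) for k in range(2)]
    for name, trace in (("stream", stream), ("short runs", short_runs)):
        print(name, "aggressive L1 only:", simulate(trace, 4, 0))
        print(name, "conservative L1 + DRAM:", simulate(trace, 1, 4))
```

In this model, an aggressive L1-only prefetcher wastes pin transfers on short runs because it pulls in blocks that are never used. The conservative L1 setting paired with a deeper DRAM-side prefetch keeps pin traffic close to the useful minimum, and on long streams most demand misses are served from the DRAM buffer. The model ignores timing, so it only hints at the effect and does not reproduce the paper's results.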

Similar resources

The Impact of Timeliness for Hardware-based Prefetching from Main Memory

Among the techniques to hide or tolerate memory latency, data prefetching has been shown to be quite effective. However, this efficiency is often limited to prefetching into the first-level cache. With more aggressive architectural parameters in current and future processors, prefetching from main memory to the second-level (L2) cache becomes increasingly more important. In this paper, we exami...

Global-aware and multi-order context-based prefetching for high-performance processors

Data prefetching is widely used in high-end computing systems to accelerate data accesses and to bridge the increasing performance gap between processor and memory. Context-based prefetching has become a primary focus of study in recent years due to its general applicability. However, current context-based prefetchers only adopt the context analysis of a single order, which suffers from low pre...

Performance of Image and Video Processing with General-Purpose Processors and M

This paper aims to provide a quantitative understanding of the performance of image and video processing applications on general-purpose processors, without and with media ISA extensions. We use detailed simulation of 12 benchmarks to study the effectiveness of current architectural features and identify future challenges for these workloads. Our results show that conventional techniques in cur...

Hardware and software cache prefetching techniques for MPEG benchmarks

With the popularity of multimedia acceleration instructions such as MMX, MPEG decompression is increasingly executed on general purpose processors instead of dedicated MPEG hardware. The gap between processor speed and memory access means that a significant amount of time is spent in the memory system. As processors get faster—both in terms of higher clock speeds and increased instruction level...

Mechanisms to improve the efficiency of hardware data prefetchers

A well known performance bottleneck in computer architecture is the so-called memory wall. This term refers to the huge disparity between on-chip and off-chip access latencies. Historically speaking, the operating frequency of processors has increased at a steady pace, while most past advances in memory technology have been in density, not speed. Nowadays, the trend for ever increasing processo...

Publication date: 2002